SFTM: Fast matching of web pages using Similarity-based Flexible Tree Matching
Abstract
Tree matching techniques have been investigated in many fields, including web data mining and extraction, as a key component to analyze the content of web pages. However, when applied to existing web pages, traditional tree matching approaches, covered by algorithms like Tree-Edit Distance (TED) or XyDiff, either fail to scale beyond a few hundred nodes or exhibit relatively low accuracy. In this article, we therefore propose a novel algorithm, named Similarity-based Flexible Tree Matching (SFTM), which enables high-accuracy tree matching on real-life web pages with practical computation times. We approach tree matching as an optimization problem and leverage node labels and local topology similarity in order to avoid any combinatorial explosion. Our evaluation demonstrates that SFTM significantly improves the state of the art in terms of accuracy, while allowing computation times lower than the most accurate solutions. By gaining on these two dimensions, SFTM offers an affordable solution to match complex trees in practice.
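The idea of scoring node pairs by label and local-topology similarity can be illustrated with a small sketch. This is not the SFTM algorithm from the article: the `Node` type, the feature sets, the `topology_weight` and `threshold` parameters, and the greedy one-to-one assignment are all assumptions made for illustration. The sketch also compares every pair of nodes, which is quadratic and therefore does not reflect how a scalable matcher would avoid combinatorial explosion.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    label: str                        # e.g. an HTML tag name
    tokens: frozenset = frozenset()   # e.g. attribute strings (hypothetical choice)
    children: list = field(default_factory=list)


def walk(root, parent=None):
    """Yield (node, parent) pairs in pre-order."""
    yield root, parent
    for child in root.children:
        yield from walk(child, root)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def node_features(node, parent):
    """Return (own-label features, local-topology features) for a node."""
    own = {f"label:{node.label}"} | {f"tok:{t}" for t in node.tokens}
    context = {f"parent:{parent.label}"} if parent else {"parent:<root>"}
    context |= {f"child:{c.label}" for c in node.children}
    return own, context


def similarity(fa, fb, topology_weight=0.3):
    """Weighted mix of label similarity and neighbourhood similarity."""
    own = jaccard(fa[0], fb[0])
    ctx = jaccard(fa[1], fb[1])
    return (1 - topology_weight) * own + topology_weight * ctx


def greedy_match(tree_a, tree_b, threshold=0.5):
    """Greedily pair nodes by descending similarity, one-to-one."""
    nodes_a = [(n, node_features(n, p)) for n, p in walk(tree_a)]
    nodes_b = [(n, node_features(n, p)) for n, p in walk(tree_b)]
    scored = sorted(
        (
            (similarity(fa, fb), i, j)
            for (i, (_, fa)), (j, (_, fb)) in product(
                enumerate(nodes_a), enumerate(nodes_b)
            )
        ),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if i in used_a or j in used_b:
            continue  # each node is matched at most once
        used_a.add(i)
        used_b.add(j)
        pairs.append((nodes_a[i][0], nodes_b[j][0], score))
    return pairs
```

Unlike TED-style approaches, a matching of this kind does not require ancestry to be preserved: a node can be paired with its counterpart even if siblings were reordered or a child was inserted, because only the similarity score decides the pairing.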
Similar resources
Fast Image Matching on Web Pages
In this paper, a fast method for image matching on web pages is presented. Such method relies on performing cross correlation in the frequency domain between the web image and the image given in the user query. The cross correlation operation is modified. Instead of performing dot multiplication in the frequency domain, image subtraction is applied in two dimensions. It is proved mathematically...
Fast Least Square Matching
Least square matching (LSM) is one of the most accurate image matching methods in photogrammetry and remote sensing. The main disadvantage of the LSM is its high computational complexity due to large size of observation equations. To address this problem, in this paper a novel method, called fast least square matching (FLSM) is being presented. The main idea of the proposed FLSM is decreasing t...
Evaluation of Similarity Measures for Template Matching
Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...
Flexible Tree Matching
Tree-matching problems arise in many computational domains. The literature provides several methods for creating correspondences between labeled trees; however, by definition, tree-matching algorithms rigidly preserve ancestry. That is, once two nodes have been placed in correspondence, their descendants must be matched as well. We introduce flexible tree matching, which relaxes this rigid requ...
Fast and Flexible String Matching
The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is becoming a more and more important issue. Most classical string matching algorithms are aimed at quickly finding an exact pattern in a text, being Knuth-Morris-Pratt (KMP) and the Boyer-Moore (BM) family the mos...
Journal
Journal title: Information Systems
Year: 2023
ISSN: 0306-4379, 1873-6076
DOI: https://doi.org/10.1016/j.is.2022.102126